A weekly summary of the database concepts we learned at class.
Week 1 - (November 23, 2010 and November 25, 2010) - Introduction to Database Systems
Information Resource Management (IRM) - the database environment
Data and information are resources as precious as time and money. IRM as a concept guides the use of a company's information in much the same way that physical resources like manpower and finances are managed.
Resources flow into and out of a company in a typical input-output manner. A student archive may take in a set of students' names, classes, and grades, and output various reports based on these data.
Physical resources such as personnel become difficult to monitor as a business grows, so conceptual resources (the relevant data and information) are used to monitor them instead.
Both types of resources can be managed the same way.
Data management consists of data acquisition, protection, quality assurance, and removal. Organizational commitment is required to perform these basic tasks.
What is a database?
Data ≠ information.
Data -> raw facts, with no inherent meaning. A name is just a noun until it is connected to a person, object or place, and then it becomes...
Information -> processed data suitable for human interpretation.
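A minimal sketch of the distinction, assuming a hypothetical student archive: the tuples below are data, and the per-student average computed from them is information. All names, classes, and grades are invented for illustration.

```python
from collections import defaultdict

# Raw data: (student, class, grade) facts with no interpretation attached.
# Every value here is made up for illustration.
records = [
    ("Ana", "CS 165", 1.5),
    ("Ana", "CS 130", 2.0),
    ("Ben", "CS 165", 2.5),
]


def average_grades(rows):
    """Process raw rows into a per-student average (the 'information')."""
    grades = defaultdict(list)
    for student, _course, grade in rows:
        grades[student].append(grade)
    return {student: sum(g) / len(g) for student, g in grades.items()}


report = average_grades(records)
for student, average in sorted(report.items()):
    print(f"{student}: average grade {average:.2f}")
```

The same rows could feed other reports (per class, per term); what counts as information depends on who needs to read it.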
To convert data into useful information, after acquisition, the data is stored, manipulated, retrieved, and distributed according to need. To support this functionality, a database is employed. A database is a "shared collection of logically related data, designed to meet the information needs of multiple users in an organization." Two generic database architectures are used:
Centralized - all data is found at a single site and may be accessed via networks or other communication protocols. While this simplifies accessing and updating data, the central site is a single point of failure: access depends on the availability of the resource at that location.
Distributed - the database is logically just one unit, but the actual data it holds are physically spread across many computers. Physically managing this type of database entails some complexity.
Homogeneous Database - a type of distributed database wherein the technology (computer OS, data models, data management systems, data definitions/formats) used on the separate systems is the same or is similar to others used.
Heterogeneous Database - the technologies may vary.
Information System Architecture (ISA) - developing a database system
The Information System Architecture is a framework that serves as the basis for the storage, planning, development, and use of the information a business relies on.
The above table (as seen in "Database Systems") concisely summarizes the ISA framework. The columns, composed of Data, Processes, and Network, are the major components of the system. Data is the "what" of the information system: the entities and relationships. Processes are the steps that produce output from given data, representing the "how" of the system. The Network describes "where" data is stored and computations are performed.
The rows represent the architectural layers of an organization's information system, namely the business scope (an overview), business model (definition of entities and relationships), information system model (detailed business data, process flow, and network definitions), technology model (conversion from model to design), technology definition (conversion from model to actual statements that generate the actual information system), and the information system itself (manages, operates and uses the completed system).
Information Engineering Methodology (IEM)
A framework provides necessary steps up to completing a model. Methodology then steps in, presenting a series of steps to accomplish a design goal. Each methodology supports different modelling tools (e.g., Computer-Aided Software Engineering or CASE) and disciplines.
IEM is one such formal methodology; it emphasizes a top-down, data-driven approach. It is also compatible with the ISA framework. It is divided into four phases:
Planning phase - the information technology is aligned with an organization's business strategies. This corresponds to the ISA framework's Business Scope layer.
First, strategic planning factors must be identified; they are the business goals, critical success factors and business problem areas.
Second, corporate planning objects must also be identified: organization units, locations, business functions and entity types.
Lastly, an enterprise model must be developed.
Analysis/Requirements Engineering Phase - the detailed specifications for the information system are developed. This maps to the Business Model and Information System Model layers of the ISA framework.
A conceptual model is built to capture the organization's structure of data, usually using an Entity-Relationship Diagram (ERD).
A process model is constructed to provide a logical description of the processes performed by organizational functions and the flow of data between processes. Processes, which convert physical or data inputs to output, are extracted from decomposed business functions, and are modelled using Data Flow Diagrams, or DFD.
Design - the conceptual and process models are transformed into designs for the information system's target technologies. The database design (both logical and physical) and the process design (the logic to be used) are created.
Implementation - here, the information system is constructed and installed according to the plans and designs. This includes coding, testing, and documentation.
Reference: Solamo, Ma. Rowena C. Database Systems. 2008.
Primary Keys, Foreign Keys, and Attributes of the Organic Shop Database
Shown here (quite messily) are the nine tables of the Organic Shop database. These are, in alphabetical order: CONSULTANT, EMPLOYEE, HOURLY_EMPLOYEE, INVENTORY, ITEM, ORDER_DETAILS, ORDERS, SALARIED_EMPLOYEE, and STORE.
Each of these tables represents an entity type, which can be a person (EMPLOYEE, HOURLY_EMPLOYEE, SALARIED_EMPLOYEE, CONSULTANT), place (STORE), object (ITEM), event (ORDERS) or concept (ORDER_DETAILS, INVENTORY) in the user environment about which the organization wishes to maintain data. The reason that we must create these is that we should not overload a single table with too much information. Putting everything in a single table results in a less logical structure, and is a sure-fire way to get a low grade in CS 165. (Don't do it!)
For the next concept, here is the legend again for your convenience:
Legend
The Attributes of the entity types are shown in blue. Attributes are properties or characteristics of an entity that are of interest to the organization. If I own an organic shop (however unlikely that may be), I will be interested in knowing my employees' names, for example.
The Primary Keys are shown in orange (and sometimes yellow). These are attributes that have been selected as the unique identifier of an entity type.
Number is a primary key for five entity types, four relating to employee numbers (which may be compared to our student numbers), and one relating to a store number. The entity type ITEM has code as a primary key, which makes sense because we encounter barcodes on products on a daily basis. To uniquely identify an ORDER, we cannot use the customerid, because a customer may order more than once, or the orderdate, because more than one order may be placed on the same day; instead, we create and assign an ordernbr to every order. As long as we do not change its value over time, this will not result in world destruction and your database will be fine.
The ORDER_DETAILS and INVENTORY entity types show that when one attribute does not suffice, it is possible, and sometimes desirable, to build the primary key from two (or more) attributes. This is a composite primary key: an entity instance is unique if the combination of its primary key values is distinct from that of every other instance of the same type.
The items in green (and sometimes yellow) are the Foreign Keys. A foreign key is an attribute or group of attributes that is the primary key of another entity.
INVENTORY's item_code is a foreign key because it is the primary key of the entity type ITEM (whose primary key, as you can tell, is code). The astute reader will see that EMPLOYEE's manager is a foreign key because it is the primary key of EMPLOYEE itself!
Finally, as you can tell by the legend, having foreign keys as primary keys is perfectly fine and acceptable. :)
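The key concepts above can be tried out with a small sketch. This is not the actual Organic Shop script: it uses Python's built-in sqlite3 module rather than JavaDB, and only the names ITEM.code, INVENTORY.item_code, EMPLOYEE.number and EMPLOYEE.manager come from the description above. The columns name, store_number and quantity, and all row values, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE item (
    code TEXT PRIMARY KEY,                        -- single-attribute primary key
    name TEXT NOT NULL                            -- hypothetical attribute
);
CREATE TABLE employee (
    number  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,                        -- hypothetical attribute
    manager INTEGER REFERENCES employee(number)   -- foreign key to EMPLOYEE itself
);
CREATE TABLE inventory (
    store_number INTEGER NOT NULL,                -- hypothetical column name
    item_code    TEXT NOT NULL REFERENCES item(code),
    quantity     INTEGER NOT NULL,                -- hypothetical attribute
    PRIMARY KEY (store_number, item_code)         -- composite primary key
);
""")

conn.execute("INSERT INTO item VALUES ('A100', 'Brown rice')")
conn.execute("INSERT INTO employee VALUES (1, 'Head manager', NULL)")
conn.execute("INSERT INTO employee VALUES (2, 'Clerk', 1)")
conn.execute("INSERT INTO inventory VALUES (1, 'A100', 40)")
conn.execute("INSERT INTO inventory VALUES (2, 'A100', 15)")  # same item, other store


def is_rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_key = is_rejected("INSERT INTO inventory VALUES (1, 'A100', 5)")
unknown_item = is_rejected("INSERT INTO inventory VALUES (3, 'ZZZ', 5)")
unknown_manager = is_rejected("INSERT INTO employee VALUES (3, 'Temp', 99)")
print(duplicate_key, unknown_item, unknown_manager)
```

Note that item_code is both part of INVENTORY's primary key and a foreign key to ITEM, which is the situation the legend marks in yellow.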
Source: 4.1 The Organic Shop Database Script.zip
This concludes the section on primary keys, foreign keys, and attributes. For more information, read the JEDI Database Systems authored by Ma. Rowena C. Solamo.
Configuring JavaDB on NetBeans
This video was prepared by the group.
Week 2-3 - Introduction to Database Systems
Entity-Relationship Modeling is a technique for defining the information needs of an organization. The focus is on the entities (important things in an organization), their attributes (or properties), and their relationships. The ER model presents these data logically and aids objective decisions unhindered by procedural constraints (such as storage and access methods).
ER Objects:
- Entity - a person, place, object, event, or concept used in a business. A singular noun, enclosed in a rectangular box.
- Entity Type/Entity Class - collections of entities with common properties.
- Entity Instance - an occurrence of an entity type.
- Attribute - properties or characteristics of an entity in connection with the organization. Enclosed in ellipses and connected to its associated entity with a line.
- Candidate Key - one or more attributes that uniquely identify an instance of an entity type.
- Composite Key - a candidate key with more than one attribute.
- Primary Key - the candidate key chosen to uniquely identify each instance of an entity type. It is chosen such that its value does not change over the lifetime of the entity instance. It must never be null and must always have a valid value (or values, for composite keys). May be assigned by the system or by the user. Underlined.
- Foreign Key - attribute(s) that is/are the primary key/s of another entity. Dash-underlined.
Also, attributes may be derived (these are called derivative data) or multi-valued (containing more than one value per entity instance).
- Relationship - an association between entities. A verb (phrase) inside a diamond or along a connecting line.
Other ER Notes:
- Degree of a relationship - defines how many entity types participate in a relationship.
- Unary - relationship between instances of one entity type. A "recursive" relationship.
- Binary - between instances of two entity types.
- Ternary - between instances of three entity types.
- Cardinality - if A and B are related entities, the cardinality is the number of instances of B that are associated with each instance of A. The minimum/maximum cardinality is the min/max number of instances of B that may be related to an instance of A. If the minimum cardinality is 0, B is optional to A; otherwise, it is mandatory.
- Mandatory One - "one and only one" instance of B is related to A. Modeled as two vertical lines.
- Many - one or more instances of B are related to A. Modeled as a vertical line next to "crow-feet" lines.
- Optional 1 or 0 - either zero or one instance of B is related to A. Modeled as a circle beside a vertical line.
- Optional zero-many - zero or more instances of B are related to A. Represented by a circle next to "crow-feet" lines.
- Existence dependency - if A and B are related entities, and B cannot exist without A, then B has an existence dependency on A. B is called a weak entity. An identifying relationship exists if the parent entity's primary key is used as part of the dependent entity's primary key.
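As a rough illustration of how the minimum and maximum cardinalities map onto the four notations listed above, here is a small sketch. The function name and the convention of using None for an unbounded maximum are assumptions made for this example, not part of the reference text.

```python
from typing import Optional


def cardinality_name(minimum: int, maximum: Optional[int]) -> str:
    """Name the cardinality of B relative to A.

    maximum=None stands for "many" (no upper bound).
    """
    if minimum < 0 or (maximum is not None and maximum < max(minimum, 1)):
        raise ValueError("invalid cardinality bounds")
    optional = minimum == 0  # a minimum of 0 makes B optional to A
    if maximum == 1:
        return "Optional 1 or 0" if optional else "Mandatory One"
    return "Optional zero-many" if optional else "Many"


for bounds in [(1, 1), (1, None), (0, 1), (0, None)]:
    print(bounds, "->", cardinality_name(*bounds))
```

For instance, if every order must belong to exactly one customer, the customer side of that relationship would be read as (1, 1), that is, Mandatory One.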
Situation Analysis
Situation - a well-defined set of circumstances that can be described using a sufficiently complete natural language. The entities, relationships, and attributes can be discovered by looking at the nouns, verbs, and adjectives (respectively) in an interview transcript.
Gerunds become many-to-many relationships.
Multi-valued attributes and repeating groups can be modeled by using the "has" relationship with one-to-many cardinality in an ERD.
As a result of generalization (the concept that some things are subtypes of other things) and categorization (the concept that things come in various subtypes), the supertype-subtype relationship is defined with an "is-a" relationship. This relationship is said to be exclusive if all the subtypes are mutually exclusive and every instance of the supertype is categorized as exactly one subtype; otherwise, if the subtypes overlap, the relationship is nonexclusive. It may also be classified as exhaustive, if all subtypes are listed in the ERD, or nonexhaustive, if only some but not all subtypes are defined.
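The exclusive/nonexclusive distinction can be checked mechanically once the instances of each subtype are known. The sketch below assumes, purely for illustration, that EMPLOYEE is a supertype of HOURLY_EMPLOYEE, SALARIED_EMPLOYEE and CONSULTANT; the employee numbers are invented and the function names are hypothetical.

```python
def is_exclusive(supertype_instances, subtypes):
    """True if every supertype instance belongs to exactly one subtype."""
    return all(
        sum(instance in members for members in subtypes.values()) == 1
        for instance in supertype_instances
    )


def overlapping_instances(subtypes):
    """Instances that appear in more than one subtype (the nonexclusive case)."""
    seen, overlaps = set(), set()
    for members in subtypes.values():
        overlaps |= seen & members
        seen |= members
    return overlaps


employees = {1, 2, 3, 4}  # invented employee numbers
exclusive_subtypes = {
    "HOURLY_EMPLOYEE": {1, 2},
    "SALARIED_EMPLOYEE": {3},
    "CONSULTANT": {4},
}
overlapping_subtypes = {
    "HOURLY_EMPLOYEE": {1, 2},
    "SALARIED_EMPLOYEE": {3},
    "CONSULTANT": {2, 4},  # employee 2 is both hourly and a consultant
}

print(is_exclusive(employees, exclusive_subtypes))
print(is_exclusive(employees, overlapping_subtypes))
print(overlapping_instances(overlapping_subtypes))
```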
Group Progress
The group had not yet acquired a suitable client for the database project by this week. The list of options included a database for a flash game, database software for a small healthcare company, and a database for a small loan company. Shortly after, it was decided that the loan database presented the fewest communication constraints and was the most feasible.
Please see the Design page for the group's interview transcript and ERD.
Week 6-7 Physical Database Design
Physical database design is the process of mapping the database structures from the logical design into physical storage structures such as files and tables. Indexes are also specified, as well as access methods and other physical factors. The major objective is to implement the database as a set of stored records, files, indexes and other data structures that will provide adequate performance and ensure database integrity, security and recoverability.
To specify the physical design of the tables, one will need to consider the following:
1. Business Rules or Integrity Constraints
The term "business rules" is usually used in the context of the analysis phase, while the term "integrity constraints" is used in the context of the design phase.
Categorization of Integrity Constraints:
· Domain Constraints
They define the data types, allowable values (e.g., uniqueness, null support), ranges of values, key types (e.g., primary key, foreign key), and format and code design that attributes may assume.
· Entity Integrity
Also known as the primary key constraint, it means that the base relation's primary key (whether single or composite) cannot be null.
· Referential Integrity
It defines the constraints that address the validity of references by one table in a database to some other table or tables in a database. It has two rules: insertion rule and deletion rule (restrict, nullify or cascade).
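The three categories can be sketched with Python's built-in sqlite3 module (not JavaDB). The names ordernbr, orderdate and customerid follow the ORDERS description earlier in this journal; the customer table, the date format check and all row values are assumptions made for illustration. SQLite's RESTRICT, SET NULL and CASCADE stand in for the restrict, nullify and cascade deletion rules.

```python
import sqlite3

DELETE_RULES = {"restrict": "RESTRICT", "nullify": "SET NULL", "cascade": "CASCADE"}


def make_db(rule):
    """Build a tiny customer/orders database using the given deletion rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks
    conn.executescript(f"""
        CREATE TABLE customer (id TEXT PRIMARY KEY NOT NULL);
        CREATE TABLE orders (
            ordernbr   TEXT PRIMARY KEY NOT NULL,                          -- entity integrity
            orderdate  TEXT NOT NULL CHECK (orderdate LIKE '____-__-__'),  -- domain: format
            customerid TEXT REFERENCES customer(id) ON DELETE {DELETE_RULES[rule]}
        );
        INSERT INTO customer VALUES ('C1');
        INSERT INTO orders VALUES ('O1', '2010-12-01', 'C1');
    """)
    return conn


def violates(conn, sql):
    """Return True if the statement breaks an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


db = make_db("restrict")
domain_violation = violates(db, "INSERT INTO orders VALUES ('O2', 'yesterday', 'C1')")
entity_violation = violates(db, "INSERT INTO orders VALUES (NULL, '2010-12-02', 'C1')")
insertion_violation = violates(db, "INSERT INTO orders VALUES ('O3', '2010-12-02', 'C9')")


def orders_after_customer_delete(rule):
    """Delete the parent row and report what happens to the child rows."""
    conn = make_db(rule)
    try:
        conn.execute("DELETE FROM customer WHERE id = 'C1'")
    except sqlite3.IntegrityError:
        return "delete refused"
    return conn.execute("SELECT ordernbr, customerid FROM orders").fetchall()


for rule in DELETE_RULES:
    print(rule, "->", orders_after_customer_delete(rule))
```

The insertion rule is what rejects order 'O3' (its customer does not exist); the deletion rule decides whether removing a customer is refused, blanks out the reference, or removes the dependent orders as well.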
2. Data Volume and Usage Analysis
When performing data volume analysis, estimates of the database size are used to select physical storage devices and estimate the cost of storage. When performing usage analysis, estimates of usage paths or patterns are used to select file organizations and access methods, to plan for the use of indexes, and to plan a strategy for data distribution.
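A back-of-the-envelope volume estimate might look like the sketch below. The row counts, average row sizes and the 30% overhead allowance are invented numbers, not measurements from the Organic Shop database; a real analysis would take them from the business's expected transaction volumes.

```python
def estimate_table_bytes(row_count, avg_row_bytes, overhead_percent=30):
    """Estimate storage for one table, padding for indexes and free space.

    Integer arithmetic is used so the result is exact.
    """
    return row_count * avg_row_bytes * (100 + overhead_percent) // 100


# Hypothetical (row count, average row size in bytes) per table.
volumes = {
    "ORDERS": (50_000, 64),
    "ORDER_DETAILS": (200_000, 32),
    "ITEM": (2_000, 128),
}

estimates = {name: estimate_table_bytes(*spec) for name, spec in volumes.items()}
total_bytes = sum(estimates.values())

for name, size in estimates.items():
    print(f"{name}: {size:,} bytes")
print(f"Total: {total_bytes:,} bytes")
```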
3. Data Distribution Strategies
4. File Organization and File Access Methods
The physical record, also known as a block or page, is the unit of transfer between disk and primary storage, and vice versa. File organization is a technique for physically arranging the records of a file on a secondary storage device.
· File Organization (Sequential and Hashed)
It is the physical arrangement of data in a file into records and pages on secondary storage.
· File Access Method (Sequential, Indexed, and Random-access or Direct-access)
Defines the steps involved in storing and retrieving records from a file.
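To make the difference between sequential and hashed organization concrete, here is a toy in-memory sketch. Real database files live in pages on disk; the lists below only imitate that, and the record keys, the modulo hash function and the bucket count are assumptions chosen for illustration.

```python
# Records as (key, value) pairs, stored in arrival order.
records = [(code, f"item-{code}") for code in (17, 4, 42, 8, 23)]


def sequential_search(file_records, key):
    """Sequential access: read records in stored order until the key matches."""
    reads = 0
    for record_key, value in file_records:
        reads += 1
        if record_key == key:
            return value, reads
    return None, reads


def build_hashed_file(file_records, bucket_count):
    """Hashed organization: a record's key determines its bucket (page)."""
    buckets = [[] for _ in range(bucket_count)]
    for record_key, value in file_records:
        buckets[record_key % bucket_count].append((record_key, value))
    return buckets


def hashed_search(buckets, key):
    """Direct access: go straight to the key's bucket and scan only that bucket."""
    reads = 0
    for record_key, value in buckets[key % len(buckets)]:
        reads += 1
        if record_key == key:
            return value, reads
    return None, reads


hashed_file = build_hashed_file(records, bucket_count=5)
print(sequential_search(records, 23))
print(hashed_search(hashed_file, 23))
```

In this toy data set the sequential search has to read every record before reaching key 23, while the hashed lookup only reads the records that share its bucket.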
5. Indexes
6. Denormalization